Skip to content

[refine](exec) replace std::shared_mutex/std::shared_lock with annotated wrappers for thread safety analysis#63109

Open
Mryange wants to merge 2 commits into
apache:masterfrom
Mryange:AnnotatedSharedMutex
Open

[refine](exec) replace std::shared_mutex/std::shared_lock with annotated wrappers for thread safety analysis#63109
Mryange wants to merge 2 commits into
apache:masterfrom
Mryange:AnnotatedSharedMutex

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented May 9, 2026

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

After #63070 replaced std::mutex/std::lock_guard with annotated wrappers (AnnotatedMutex/LockGuard), shared mutex usage still relied on raw std::shared_mutex and std::shared_lock/std::unique_lock, which are invisible to Clang's thread safety analysis. This leaves shared-lock sites unverified — GUARDED_BY, REQUIRES_SHARED, and other annotations cannot be enforced.

  • Add AnnotatedSharedMutex (wrapping std::shared_mutex with CAPABILITY/ACQUIRE/RELEASE/ACQUIRE_SHARED/RELEASE_SHARED annotations) and SharedLockGuard (RAII SCOPED_CAPABILITY with ACQUIRE_SHARED/RELEASE) in thread_safety_annotations.h.
  • Migrate VDataStreamMgr and RuntimeFilterMergeControllerEntity from std::shared_mutex to AnnotatedSharedMutex, and from std::unique_lock/std::shared_lock to LockGuard/SharedLockGuard.
  • Add GUARDED_BY annotations to the protected maps.
  • Extract _find_recvr as a private helper annotated with REQUIRES_SHARED(_lock), eliminating the bool acquire_lock parameter that previously bypassed lock tracking.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 9, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated code review summary:\n\nNo blocking issues found in the actual PR diff. The change is focused on adding annotated shared-mutex/shared-lock wrappers and applying them to VDataStreamMgr and RuntimeFilterMergeControllerEntity without changing the effective locking order or heavy-work placement.\n\nCritical checkpoints:\n- Goal/test: The PR goal appears to be enabling Clang thread-safety analysis for shared-mutex users; the code accomplishes this for the touched classes. No test changes are included, but this is a static-analysis annotation/refactor with no intended runtime behavior change.\n- Scope: The modification is small and focused.\n- Concurrency: Existing shared/exclusive lock protection is preserved. VDataStreamMgr keeps receiver-map/set operations under the same mutex, and cancel still copies receivers before doing heavier cancellation. RuntimeFilterMergeControllerEntity preserves the map mutex plus per-filter mutex pattern.\n- Lifecycle/static initialization: No new lifecycle-sensitive globals or static initialization dependencies were introduced.\n- Configuration/compatibility/persistence/data writes: Not applicable.\n- Parallel paths: The two touched shared-mutex users are updated consistently in their declarations and call sites.\n- Tests/results: No runtime test result changes. A compile/static-analysis run in CI should validate the annotation intent.\n- Observability/performance: No new logs or metrics needed; runtime lock behavior remains equivalent to std::shared_mutex/std::shared_lock.\n- User focus: No additional user-provided review focus was specified.

@Mryange Mryange changed the title refine](exec) replace std::shared_mutex/std::shared_lock with annotated wrappers for thread safety analysis [refine](exec) replace std::shared_mutex/std::shared_lock with annotated wrappers for thread safety analysis May 9, 2026
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 9, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29710 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit de25d0859bda847da02dff3760136983d63fd7e6, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17650	3965	3885	3885
q2	q3	10700	873	610	610
q4	4661	460	342	342
q5	7455	1343	1158	1158
q6	189	167	139	139
q7	901	949	760	760
q8	9308	1407	1289	1289
q9	5603	5423	5377	5377
q10	6259	2084	1824	1824
q11	490	273	257	257
q12	622	412	290	290
q13	18100	3366	2747	2747
q14	288	287	263	263
q15	q16	864	874	792	792
q17	979	993	681	681
q18	6451	5779	5631	5631
q19	1176	1271	1073	1073
q20	524	393	260	260
q21	4961	2447	1997	1997
q22	471	424	335	335
Total cold run time: 97652 ms
Total hot run time: 29710 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4770	4755	4826	4755
q2	q3	4758	4819	4199	4199
q4	2141	2177	1394	1394
q5	5018	5045	5324	5045
q6	200	184	144	144
q7	2057	1790	1645	1645
q8	3392	3145	3120	3120
q9	8449	8500	8427	8427
q10	4524	4521	4252	4252
q11	605	436	403	403
q12	733	786	550	550
q13	3342	3655	2953	2953
q14	315	307	272	272
q15	q16	771	921	692	692
q17	1352	1292	1242	1242
q18	8031	7134	7172	7134
q19	1151	1183	1150	1150
q20	2221	2232	1953	1953
q21	6213	5405	4861	4861
q22	548	495	416	416
Total cold run time: 60591 ms
Total hot run time: 54607 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit de25d0859bda847da02dff3760136983d63fd7e6, data reload: false

query5	4298	657	527	527
query6	331	218	199	199
query7	4219	592	301	301
query8	325	243	229	229
query9	8826	3996	4031	3996
query10	451	340	291	291
query11	5768	2421	2197	2197
query12	181	129	123	123
query13	1264	595	415	415
query14	5947	5360	5024	5024
query14_1	4318	4346	4306	4306
query15	209	207	182	182
query16	1023	478	417	417
query17	1006	741	627	627
query18	2460	479	367	367
query19	227	206	177	177
query20	143	136	136	136
query21	214	139	123	123
query22	13584	13905	14578	13905
query23	17351	16561	16268	16268
query23_1	16502	16303	16277	16277
query24	7518	1754	1364	1364
query24_1	1372	1345	1357	1345
query25	606	525	453	453
query26	1301	315	170	170
query27	2736	618	338	338
query28	4462	1996	1935	1935
query29	1008	665	543	543
query30	302	239	202	202
query31	1104	1068	950	950
query32	92	74	72	72
query33	531	363	309	309
query34	1180	1157	671	671
query35	774	785	680	680
query36	1336	1363	1197	1197
query37	154	101	90	90
query38	3191	3114	2983	2983
query39	919	969	906	906
query39_1	893	878	871	871
query40	231	156	134	134
query41	65	60	60	60
query42	110	107	103	103
query43	319	326	283	283
query44	
query45	209	201	192	192
query46	1067	1209	715	715
query47	2371	2348	2176	2176
query48	406	416	305	305
query49	651	529	428	428
query50	690	285	221	221
query51	4322	4235	4247	4235
query52	104	106	93	93
query53	240	282	206	206
query54	314	280	260	260
query55	89	90	85	85
query56	299	293	304	293
query57	1435	1396	1305	1305
query58	297	271	262	262
query59	1631	1642	1453	1453
query60	344	341	331	331
query61	160	159	161	159
query62	673	622	563	563
query63	237	207	204	204
query64	2531	830	691	691
query65	
query66	1768	503	394	394
query67	30052	30056	29799	29799
query68	
query69	460	340	305	305
query70	1034	995	980	980
query71	307	292	275	275
query72	2986	2764	2476	2476
query73	860	793	416	416
query74	5078	4890	4708	4708
query75	2801	2653	2330	2330
query76	2298	1112	725	725
query77	410	409	333	333
query78	12904	12937	12223	12223
query79	1507	1029	736	736
query80	1021	583	489	489
query81	489	279	238	238
query82	1329	162	126	126
query83	351	281	257	257
query84	304	139	114	114
query85	934	535	449	449
query86	436	365	314	314
query87	3414	3346	3206	3206
query88	3546	2716	2635	2635
query89	439	384	338	338
query90	1807	188	186	186
query91	180	173	139	139
query92	85	73	71	71
query93	950	947	555	555
query94	634	369	317	317
query95	661	463	364	364
query96	1024	767	358	358
query97	2666	2684	2593	2593
query98	233	229	230	229
query99	1134	1119	995	995
Total cold run time: 254467 ms
Total hot run time: 170673 ms

// Wraps std::shared_mutex and provides both exclusive and shared capability
// operations so GUARDED_BY / REQUIRES_SHARED / etc. can reference it.
class CAPABILITY("mutex") AnnotatedSharedMutex {
public:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我们使用这个封装的API,跟直接使用lock的各种方法是什么区别?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 11, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29583 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c7d22ae640bc29f8446cca7edb2477173eae1ae4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17614	3950	4034	3950
q2	q3	10713	946	619	619
q4	4660	456	341	341
q5	7458	1367	1161	1161
q6	192	168	138	138
q7	900	929	754	754
q8	9311	1447	1307	1307
q9	5662	5405	5307	5307
q10	6256	2087	1824	1824
q11	473	264	261	261
q12	626	416	297	297
q13	18108	3323	2700	2700
q14	294	288	260	260
q15	q16	906	884	791	791
q17	1022	1036	723	723
q18	6569	5717	5518	5518
q19	1197	1216	1055	1055
q20	508	398	274	274
q21	4606	2453	1969	1969
q22	501	395	334	334
Total cold run time: 97576 ms
Total hot run time: 29583 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4769	4791	4864	4791
q2	q3	4713	4810	4228	4228
q4	2151	2251	1476	1476
q5	5018	5074	5370	5074
q6	202	176	140	140
q7	2069	1814	1665	1665
q8	3359	3181	3122	3122
q9	8481	8462	8446	8446
q10	4497	4532	4247	4247
q11	600	421	399	399
q12	717	753	524	524
q13	3275	3647	2944	2944
q14	307	431	292	292
q15	q16	809	810	729	729
q17	1353	1320	1291	1291
q18	8095	7103	7164	7103
q19	1153	1225	1142	1142
q20	2279	2218	1942	1942
q21	6233	5581	4995	4995
q22	543	511	422	422
Total cold run time: 60623 ms
Total hot run time: 54972 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169648 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c7d22ae640bc29f8446cca7edb2477173eae1ae4, data reload: false

query5	4310	648	515	515
query6	342	221	205	205
query7	4301	582	309	309
query8	338	232	224	224
query9	8855	4048	3990	3990
query10	441	356	299	299
query11	5827	2402	2210	2210
query12	189	128	125	125
query13	1278	606	411	411
query14	6036	5365	5016	5016
query14_1	4331	4322	4328	4322
query15	214	206	185	185
query16	1007	498	430	430
query17	1134	748	624	624
query18	2439	489	348	348
query19	220	208	173	173
query20	138	134	131	131
query21	215	135	120	120
query22	13687	13533	13517	13517
query23	17097	16302	16106	16106
query23_1	16202	16185	16177	16177
query24	7579	1735	1320	1320
query24_1	1337	1325	1357	1325
query25	571	530	482	482
query26	1345	334	184	184
query27	2672	589	347	347
query28	4413	1957	1947	1947
query29	1056	670	550	550
query30	307	246	203	203
query31	1132	1064	945	945
query32	90	77	75	75
query33	558	366	321	321
query34	1162	1126	638	638
query35	776	794	668	668
query36	1325	1367	1162	1162
query37	157	106	92	92
query38	3218	3151	3061	3061
query39	940	933	926	926
query39_1	878	903	894	894
query40	243	163	143	143
query41	70	68	66	66
query42	113	114	114	114
query43	331	335	291	291
query44	
query45	217	210	200	200
query46	1099	1194	731	731
query47	2360	2381	2216	2216
query48	384	404	299	299
query49	652	548	460	460
query50	708	303	224	224
query51	4249	4279	4150	4150
query52	105	106	98	98
query53	260	292	219	219
query54	332	292	296	292
query55	96	90	84	84
query56	315	308	297	297
query57	1425	1422	1330	1330
query58	300	274	269	269
query59	1555	1666	1444	1444
query60	359	335	330	330
query61	160	149	157	149
query62	668	629	564	564
query63	243	198	204	198
query64	2440	826	658	658
query65	
query66	1745	522	395	395
query67	30003	29398	29764	29398
query68	
query69	459	362	293	293
query70	993	1011	955	955
query71	312	273	275	273
query72	2979	2760	2386	2386
query73	842	777	422	422
query74	5064	4879	4723	4723
query75	2781	2657	2354	2354
query76	2290	1133	753	753
query77	445	447	348	348
query78	12800	12989	12187	12187
query79	1499	977	748	748
query80	686	601	488	488
query81	465	282	254	254
query82	1365	161	123	123
query83	351	275	252	252
query84	268	146	108	108
query85	883	514	464	464
query86	402	327	321	321
query87	3465	3386	3204	3204
query88	3533	2670	2643	2643
query89	443	384	342	342
query90	1902	183	183	183
query91	181	172	141	141
query92	79	86	73	73
query93	951	975	560	560
query94	549	364	291	291
query95	681	464	354	354
query96	1013	748	340	340
query97	2726	2688	2577	2577
query98	237	229	228	228
query99	1144	1124	981	981
Total cold run time: 253309 ms
Total hot run time: 169648 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.33% (30/36) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27842/37768)
Line Coverage 57.57% (301009/522822)
Region Coverage 54.79% (250866/457862)
Branch Coverage 56.33% (108472/192558)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants